
Remove all nemo2 imports from old repo #628

Open
oyilmaz-nvidia wants to merge 4 commits into main from fix/ruff-linting
Conversation

@oyilmaz-nvidia
Contributor

No description provided.

oyilmaz-nvidia and others added 4 commits March 3, 2026 16:41
… dynamic inference

- Add nemo_deploy/llm/inference/nemo_utils.py which vendors standalone NeMo
  utilities (MCoreTokenizerWrappper, ckpt path helpers, constants) with no
  dependency on the nemo package, and re-exports the complex NeMo types
  (GPTConfig, T5Config, io, set_modelopt_spec_if_exists_in_ckpt) under a
  single HAVE_NEMO guard.
- Remove direct "from nemo.*" imports from inference_base.py and tron_utils.py;
  both files now import from the local nemo_utils module instead.
- Fix AttributeError in create_mcore_engine: GPTInferenceWrapper was called
  with (model, inference_context), but the current Megatron-LM API expects
  (model, inference_wrapper_config, inference_context). Add an InferenceWrapperConfig
  built from model.config attributes; MCoreEngine then internally creates a
  DynamicInferenceContext and switches to DynamicInferenceEngine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
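The call-signature fix described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: GPTInferenceWrapper is replaced by a local stand-in with the same three positional parameters as the Megatron-LM class, and the config fields (hidden_size, params_dtype, padded_vocab_size) are assumed examples of the model.config attributes being forwarded.

```python
from types import SimpleNamespace


class GPTInferenceWrapper:
    """Minimal stand-in for Megatron-LM's GPTInferenceWrapper, which takes
    (model, inference_wrapper_config, inference_context) positionally."""

    def __init__(self, model, inference_wrapper_config, inference_context):
        self.model = model
        self.inference_wrapper_config = inference_wrapper_config
        self.inference_context = inference_context


def build_wrapper(model, inference_context):
    # Before the fix the wrapper was called as (model, inference_context),
    # which shifted the context into the config slot and raised AttributeError
    # when wrapper code later read config attributes off it.
    # The fix builds a wrapper config from model.config attributes first.
    cfg = model.config
    wrapper_config = SimpleNamespace(  # stand-in for InferenceWrapperConfig
        hidden_size=cfg.hidden_size,
        params_dtype=cfg.params_dtype,
        padded_vocab_size=cfg.padded_vocab_size,
    )
    # The config is now the second positional argument, as the API expects.
    return GPTInferenceWrapper(model, wrapper_config, inference_context)
```

Usage: given any object exposing model.config with those fields, build_wrapper returns a wrapper whose inference_context slot holds the context rather than a misplaced config.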
- Fix import ordering in test_inference_base.py (ruff I001)
- Remove direct nemo imports from inference_base.py, nemo_utils.py, tron_utils.py
- Add nemo_io.py with standalone load_context implementation
- Remove HAVE_NEMO guard checks now that nemo is no longer a static dependency
- Update tests to remove HAVE_NEMO patches and use types.SimpleNamespace
- Remove unused StaticInferenceContext import
- Use inner model config for hidden_size/params_dtype instead of outer model
- Add buffer_size_gb param to create_mcore_engine and MegatronLLMDeployable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
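The inner-config change and the SimpleNamespace test style from the list above can be sketched together. The model shape used here (an outer wrapper holding an inner module whose .config carries the architecture fields) is an assumption for illustration, not the exact deployed class layout.

```python
from types import SimpleNamespace

# Test double in the style the updated tests use: types.SimpleNamespace
# instead of patching a HAVE_NEMO flag. The outer model does not expose
# hidden_size itself; only the inner module's config does.
inner_config = SimpleNamespace(hidden_size=4096, params_dtype="bfloat16")
model = SimpleNamespace(module=SimpleNamespace(config=inner_config))

# The fix reads architecture fields from the inner model config rather than
# from attributes on the outer model object.
hidden_size = model.module.config.hidden_size
params_dtype = model.module.config.params_dtype
```

Reading through the inner module's config avoids depending on the outer wrapper mirroring those attributes, which it may not do for all model types.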
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

